import pandas as pd
import numpy as np
import plotly.express as px
df = pd.read_csv('data.csv')
df.head()
| customer_id | first_hf_delivery_week | hf_delivery_week | intial_nps_score | NPSgroup | channel | Box_type | Box_size | number_of_recipes | State | average_recipe_rating | status | total_boxes_sold | total_revenue | total_profit_margin | total_cost | total_discount | total_profit | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 189213 | 2016-W07 | 2019-W35 | 8 | Passives | Non_Freebies | Classic_box | 2 | 3 | Victoria | NaN | Cancelled | 3 | 248.40 | 72.636 | 175.764 | 50.0 | 22.636 |
| 1 | 200203 | 2016-W09 | 2019-W35 | 8 | Passives | Non_Freebies | Family_box | 4 | 4 | New South Wales | 3.500000 | Cancelled | 50 | 8249.64 | 2621.856 | 5627.784 | 100.0 | 2521.856 |
| 2 | 239767 | 2016-W17 | 2019-W35 | 0 | detractors | Non_Freebies | Classic_box | 2 | 3 | Victoria | NaN | Cancelled | 1 | 82.80 | 24.456 | 58.344 | 50.0 | -25.544 |
| 3 | 248559 | 2016-W19 | 2019-W35 | 2 | detractors | Non_Freebies | Family_box | 4 | 4 | New South Wales | 2.666667 | Cancelled | 2 | 333.60 | 98.760 | 234.840 | 278.0 | -179.240 |
| 4 | 271638 | 2016-W24 | 2019-W35 | 8 | Passives | Non_Freebies | Family_box | 4 | 4 | South Australia | 2.000000 | Cancelled | 3 | 501.54 | 150.684 | 350.856 | 120.0 | 30.684 |
df.shape
(80000, 18)
df['hf_delivery_week'].value_counts()
2019-W35 80000 Name: hf_delivery_week, dtype: int64
df.nunique()
customer_id 80000 first_hf_delivery_week 194 hf_delivery_week 1 intial_nps_score 11 NPSgroup 3 channel 2 Box_type 3 Box_size 2 number_of_recipes 3 State 7 average_recipe_rating 998 status 3 total_boxes_sold 179 total_revenue 18365 total_profit_margin 40285 total_cost 41566 total_discount 7932 total_profit 53255 dtype: int64
df['first_week_of_purchase'] = df['first_hf_delivery_week'] == df['hf_delivery_week']
df['first_week_of_purchase'] = df['first_week_of_purchase'].astype(int)
df[['year', 'week']] = df['first_hf_delivery_week'].str.split('-W', expand=True).rename(columns={0:'year', 1:'week'})
df[['year', 'week']] = df[['year', 'week']].astype(int)
df
| customer_id | first_hf_delivery_week | hf_delivery_week | intial_nps_score | NPSgroup | channel | Box_type | Box_size | number_of_recipes | State | ... | status | total_boxes_sold | total_revenue | total_profit_margin | total_cost | total_discount | total_profit | first_week_of_purchase | year | week | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 189213 | 2016-W07 | 2019-W35 | 8 | Passives | Non_Freebies | Classic_box | 2 | 3 | Victoria | ... | Cancelled | 3 | 248.400 | 72.636 | 175.764 | 50.0 | 22.636 | 0 | 2016 | 7 |
| 1 | 200203 | 2016-W09 | 2019-W35 | 8 | Passives | Non_Freebies | Family_box | 4 | 4 | New South Wales | ... | Cancelled | 50 | 8249.640 | 2621.856 | 5627.784 | 100.0 | 2521.856 | 0 | 2016 | 9 |
| 2 | 239767 | 2016-W17 | 2019-W35 | 0 | detractors | Non_Freebies | Classic_box | 2 | 3 | Victoria | ... | Cancelled | 1 | 82.800 | 24.456 | 58.344 | 50.0 | -25.544 | 0 | 2016 | 17 |
| 3 | 248559 | 2016-W19 | 2019-W35 | 2 | detractors | Non_Freebies | Family_box | 4 | 4 | New South Wales | ... | Cancelled | 2 | 333.600 | 98.760 | 234.840 | 278.0 | -179.240 | 0 | 2016 | 19 |
| 4 | 271638 | 2016-W24 | 2019-W35 | 8 | Passives | Non_Freebies | Family_box | 4 | 4 | South Australia | ... | Cancelled | 3 | 501.540 | 150.684 | 350.856 | 120.0 | 30.684 | 0 | 2016 | 24 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 79995 | 712308 | 2017-W47 | 2019-W35 | 7 | Passives | Non_Freebies | Classic_box | 2 | 3 | Victoria | ... | Cancelled | 4 | 335.760 | 153.756 | 182.004 | 160.0 | -6.244 | 0 | 2017 | 47 |
| 79996 | 722199 | 2017-W48 | 2019-W35 | 1 | detractors | Non_Freebies | Classic_box | 2 | 3 | Western Australia | ... | Cancelled | 2 | 189.588 | 84.048 | 105.540 | 100.0 | -15.952 | 0 | 2017 | 48 |
| 79997 | 726595 | 2019-W18 | 2019-W35 | 10 | Promoters | Freebies | Family_box | 4 | 3 | Western Australia | ... | Cancelled | 1 | 142.800 | 69.132 | 73.668 | 219.9 | -150.768 | 0 | 2019 | 18 |
| 79998 | 737585 | 2017-W50 | 2019-W35 | 8 | Passives | Freebies | Classic_box | 4 | 5 | New South Wales | ... | Cancelled | 1 | 227.940 | 102.132 | 125.808 | 379.9 | -277.768 | 0 | 2017 | 50 |
| 79999 | 754070 | 2017-W51 | 2019-W35 | 9 | Promoters | Freebies | Classic_box | 4 | 3 | Victoria | ... | Cancelled | 1 | 149.940 | 74.256 | 75.684 | 249.9 | -175.644 | 0 | 2017 | 51 |
80000 rows × 21 columns
df['first_week_of_purchase'].value_counts() / len(df)
0 0.990812 1 0.009187 Name: first_week_of_purchase, dtype: float64
df.groupby(['State', 'first_week_of_purchase']).mean()
| customer_id | intial_nps_score | Box_size | number_of_recipes | average_recipe_rating | total_boxes_sold | total_revenue | total_profit_margin | total_cost | total_discount | total_profit | year | week | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| State | first_week_of_purchase | |||||||||||||
| Australian Capital Territory | 0 | 9.693173e+05 | 7.605027 | 2.688211 | 3.688211 | 3.352316 | 11.861759 | 1308.792151 | 568.127598 | 740.664553 | 232.286990 | 335.840608 | 2017.574506 | 24.613405 |
| 1 | 2.227973e+06 | 8.909091 | 2.727273 | 4.000000 | 3.722222 | 1.000000 | 133.009091 | 67.105091 | 65.904000 | 168.845455 | -101.740364 | 2019.000000 | 35.000000 | |
| New South Wales | 0 | 1.096961e+06 | 7.980523 | 2.759645 | 3.715367 | 3.409994 | 10.457076 | 1222.546029 | 544.991014 | 677.555015 | 232.130978 | 312.860036 | 2017.816161 | 25.087389 |
| 1 | 2.100574e+06 | 8.444444 | 3.200000 | 3.905556 | 3.656764 | 1.000000 | 148.859667 | 75.104067 | 73.755600 | 194.854111 | -119.750044 | 2019.000000 | 35.000000 | |
| Northern Territory | 0 | 9.642148e+05 | 7.949458 | 2.729242 | 3.772563 | 3.363270 | 13.249097 | 1585.041632 | 726.098888 | 858.942744 | 252.826980 | 473.271909 | 2017.748496 | 23.880866 |
| 1 | 2.212293e+06 | 8.500000 | 2.000000 | 4.000000 | NaN | 1.000000 | 125.940000 | 63.540000 | 62.400000 | 93.980000 | -30.440000 | 2019.000000 | 35.000000 | |
| Queensland | 0 | 1.147509e+06 | 8.094637 | 2.811336 | 3.858683 | 3.436163 | 9.464240 | 1149.711609 | 519.681870 | 630.029739 | 241.053932 | 278.627938 | 2017.916644 | 25.543873 |
| 1 | 2.082502e+06 | 8.086207 | 2.827586 | 4.011494 | 3.621987 | 1.005747 | 139.306897 | 70.284345 | 69.022552 | 177.533218 | -107.248874 | 2019.000000 | 35.000000 | |
| South Australia | 0 | 1.030991e+06 | 7.910171 | 2.719883 | 3.761984 | 3.387126 | 10.395373 | 1174.516014 | 518.785013 | 655.731001 | 245.094865 | 273.690148 | 2017.692580 | 25.748645 |
| 1 | 1.975455e+06 | 8.107143 | 3.000000 | 3.964286 | 3.462963 | 1.000000 | 143.950714 | 72.626571 | 71.324143 | 192.372857 | -119.746286 | 2019.000000 | 35.000000 | |
| Victoria | 0 | 1.169107e+06 | 7.991698 | 2.766948 | 3.720574 | 3.417914 | 9.522647 | 1091.895677 | 492.469510 | 599.426167 | 230.705671 | 261.763839 | 2017.933337 | 25.530065 |
| 1 | 2.132970e+06 | 7.760000 | 2.924444 | 3.857778 | 3.607332 | 1.000000 | 138.053867 | 69.651680 | 68.402187 | 170.074933 | -100.423253 | 2019.000000 | 35.000000 | |
| Western Australia | 0 | 1.381088e+06 | 8.162401 | 2.846944 | 3.808301 | 3.462456 | 7.708621 | 1004.500203 | 477.170303 | 527.329899 | 239.536989 | 237.633314 | 2018.324634 | 26.888365 |
| 1 | 2.018081e+06 | 7.732558 | 3.093023 | 3.813953 | 3.599106 | 1.000000 | 150.610465 | 75.990419 | 74.620047 | 171.844651 | -95.854233 | 2019.000000 | 35.000000 |
df.groupby(['first_hf_delivery_week']).mean()
| customer_id | intial_nps_score | Box_size | number_of_recipes | average_recipe_rating | total_boxes_sold | total_revenue | total_profit_margin | total_cost | total_discount | total_profit | first_week_of_purchase | year | week | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| first_hf_delivery_week | ||||||||||||||
| 2016-W02 | 1.725828e+05 | 7.927536 | 2.144928 | 3.434783 | 3.036749 | 25.101449 | 2354.606957 | 833.924522 | 1520.682435 | 199.226377 | 634.698145 | 0.0 | 2016.0 | 2.0 |
| 2016-W03 | 1.750740e+05 | 7.870968 | 2.236559 | 3.408602 | 2.975063 | 26.247312 | 2564.519355 | 929.021161 | 1635.498194 | 225.977849 | 703.043312 | 0.0 | 2016.0 | 3.0 |
| 2016-W04 | 1.785012e+05 | 7.224719 | 2.404494 | 3.561798 | 3.289875 | 22.382022 | 2311.013258 | 839.173753 | 1471.839506 | 285.143146 | 554.030607 | 0.0 | 2016.0 | 4.0 |
| 2016-W05 | 1.810846e+05 | 6.893617 | 2.404255 | 3.361702 | 3.007300 | 26.808511 | 3145.912340 | 1148.485915 | 1997.426426 | 240.106809 | 908.379106 | 0.0 | 2016.0 | 5.0 |
| 2016-W06 | 1.848043e+05 | 6.769231 | 2.593407 | 3.450549 | 3.147580 | 17.989011 | 1889.729011 | 649.697934 | 1240.031077 | 216.808352 | 432.889582 | 0.0 | 2016.0 | 6.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2019-W35 | 2.094520e+06 | 8.055782 | 2.987755 | 3.906122 | 3.614136 | 1.002721 | 143.044816 | 72.170514 | 70.874302 | 178.595673 | -106.425159 | 1.0 | 2019.0 | 35.0 |
| 2019-W36 | 2.127101e+06 | 8.208333 | 2.977011 | 4.002874 | 3.618878 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 2019.0 | 36.0 |
| 2019-W37 | 2.157727e+06 | 7.981735 | 2.986301 | 3.949772 | 3.610970 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 2019.0 | 37.0 |
| 2019-W38 | 2.230724e+06 | 6.850000 | 3.100000 | 3.700000 | 3.583333 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 2019.0 | 38.0 |
| 2019-W40 | 1.679323e+06 | 10.000000 | 4.000000 | 5.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 2019.0 | 40.0 |
194 rows × 14 columns
df.groupby(['NPSgroup']).mean()
| customer_id | intial_nps_score | Box_size | number_of_recipes | average_recipe_rating | total_boxes_sold | total_revenue | total_profit_margin | total_cost | total_discount | total_profit | first_week_of_purchase | year | week | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NPSgroup | ||||||||||||||
| Passives | 1.128452e+06 | 7.636063 | 2.805558 | 3.704531 | 3.340167 | 9.467669 | 1102.455352 | 492.320388 | 610.134964 | 233.731315 | 258.589073 | 0.007842 | 2017.859993 | 25.549436 |
| Promoters | 1.219471e+06 | 9.753441 | 2.779174 | 3.837707 | 3.542568 | 11.317705 | 1355.100476 | 617.195749 | 737.904728 | 247.060613 | 370.135136 | 0.009851 | 2018.031326 | 25.961377 |
| detractors | 1.101451e+06 | 3.514104 | 2.770274 | 3.656564 | 2.982682 | 4.377407 | 511.674902 | 226.991486 | 284.683417 | 202.012876 | 24.978609 | 0.009222 | 2017.818280 | 25.179346 |
df2 = df[df['year'] >= 2018]
px.scatter(df2.dropna(subset=['State']), x='total_revenue', y='total_profit', color='State')
df.isnull().sum()
customer_id 0 first_hf_delivery_week 0 hf_delivery_week 0 intial_nps_score 0 NPSgroup 0 channel 0 Box_type 0 Box_size 0 number_of_recipes 0 State 355 average_recipe_rating 52071 status 0 total_boxes_sold 0 total_revenue 0 total_profit_margin 0 total_cost 0 total_discount 0 total_profit 0 first_week_of_purchase 0 year 0 week 0 dtype: int64
df2.columns = ['customer_id', 'first_hf_delivery_week', 'hf_delivery_week',
'intial_nps_score', 'NPSgroup', 'channel', 'Box_type', 'Box_size',
'number_of_recipes', 'State', 'average_recipe_rating', 'status',
'total_boxes_sold', 'total_revenue', 'total_profit_margin',
'total_cost', 'total_discount', 'total_profit',
'first_week_of_purchase', 'year', 'week']
df2018 = df2[(df2['week'] <= 35) & (df2['year']==2018)].copy()
df2019 = df2[(df2['week'] <= 35) & (df2['year']==2019)].copy()
df2018.sort_values(['year', 'week'], inplace=True)
df2019.sort_values(['year', 'week'], inplace=True)
df2018.columns
Index(['customer_id', 'first_hf_delivery_week', 'hf_delivery_week',
'intial_nps_score', 'NPSgroup', 'channel', 'Box_type', 'Box_size',
'number_of_recipes', 'State', 'average_recipe_rating', 'status',
'total_boxes_sold', 'total_revenue', 'total_profit_margin',
'total_cost', 'total_discount', 'total_profit',
'first_week_of_purchase', 'year', 'week'],
dtype='object')
df2018_agg = df2018.groupby('week', as_index=False).agg({'customer_id':'count', 'total_profit':'mean'})
df2019_agg = df2019.groupby('week', as_index=False).agg({'customer_id':'count', 'total_profit':'mean'})
df2018_agg.rename(columns={'customer_id': 'user_acquisitions_2018', 'total_profit': 'total_profit_2018'}, inplace=True)
df2019_agg.rename(columns={'customer_id': 'user_acquisitions_2019', 'total_profit': 'total_profit_2019'}, inplace=True)
agg = pd.merge(df2018_agg, df2019_agg, on='week')
agg.head()
| week | user_acquisitions_2018 | total_profit_2018 | user_acquisitions_2019 | total_profit_2019 | |
|---|---|---|---|---|---|
| 0 | 1 | 146 | 360.768521 | 279 | 176.044387 |
| 1 | 2 | 268 | 705.135478 | 610 | 236.873757 |
| 2 | 3 | 355 | 577.652428 | 593 | 254.710725 |
| 3 | 4 | 354 | 579.515379 | 666 | 230.883339 |
| 4 | 5 | 422 | 493.488919 | 675 | 204.715135 |
for col in ['user_acquisitions_2018', 'total_profit_2018', 'user_acquisitions_2019', 'total_profit_2019']:
agg[col+'_cumsum'] = agg[col].cumsum()
agg.head(20)
| week | user_acquisitions_2018 | total_profit_2018 | user_acquisitions_2019 | total_profit_2019 | user_acquisitions_2018_cumsum | total_profit_2018_cumsum | user_acquisitions_2019_cumsum | total_profit_2019_cumsum | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 146 | 360.768521 | 279 | 176.044387 | 146 | 360.768521 | 279 | 176.044387 |
| 1 | 2 | 268 | 705.135478 | 610 | 236.873757 | 414 | 1065.903998 | 889 | 412.918144 |
| 2 | 3 | 355 | 577.652428 | 593 | 254.710725 | 769 | 1643.556426 | 1482 | 667.628870 |
| 3 | 4 | 354 | 579.515379 | 666 | 230.883339 | 1123 | 2223.071805 | 2148 | 898.512209 |
| 4 | 5 | 422 | 493.488919 | 675 | 204.715135 | 1545 | 2716.560724 | 2823 | 1103.227344 |
| 5 | 6 | 429 | 563.853268 | 618 | 184.853430 | 1974 | 3280.413992 | 3441 | 1288.080774 |
| 6 | 7 | 525 | 463.813882 | 690 | 183.046597 | 2499 | 3744.227874 | 4131 | 1471.127371 |
| 7 | 8 | 502 | 433.262733 | 634 | 167.175590 | 3001 | 4177.490607 | 4765 | 1638.302961 |
| 8 | 9 | 614 | 469.407792 | 671 | 151.718903 | 3615 | 4646.898399 | 5436 | 1790.021864 |
| 9 | 10 | 651 | 322.870421 | 861 | 146.019396 | 4266 | 4969.768820 | 6297 | 1936.041260 |
| 10 | 11 | 792 | 366.428742 | 704 | 166.128216 | 5058 | 5336.197562 | 7001 | 2102.169476 |
| 11 | 12 | 853 | 363.148938 | 867 | 159.145689 | 5911 | 5699.346500 | 7868 | 2261.315165 |
| 12 | 13 | 516 | 376.741744 | 831 | 142.823533 | 6427 | 6076.088244 | 8699 | 2404.138698 |
| 13 | 14 | 436 | 377.581119 | 558 | 146.942301 | 6863 | 6453.669363 | 9257 | 2551.080999 |
| 14 | 15 | 529 | 447.077422 | 622 | 96.611794 | 7392 | 6900.746785 | 9879 | 2647.692793 |
| 15 | 16 | 331 | 682.898840 | 600 | 107.116360 | 7723 | 7583.645625 | 10479 | 2754.809153 |
| 16 | 17 | 513 | 422.432483 | 591 | 78.839019 | 8236 | 8006.078108 | 11070 | 2833.648172 |
| 17 | 18 | 551 | 513.125321 | 660 | 113.333818 | 8787 | 8519.203430 | 11730 | 2946.981990 |
| 18 | 19 | 499 | 370.117523 | 682 | 108.582979 | 9286 | 8889.320953 | 12412 | 3055.564969 |
| 19 | 20 | 871 | 336.624427 | 642 | 108.715938 | 10157 | 9225.945380 | 13054 | 3164.280907 |
px.bar(agg, x='week', y=['user_acquisitions_2018', 'user_acquisitions_2019'], barmode='group', title='First purchase week of customers who purchased on W35 of 2019')
px.bar(agg, x='week', y=['user_acquisitions_2018_cumsum', 'user_acquisitions_2019_cumsum'], barmode='group', title='First purchase week of customers who purchased on W35 of 2019 (cumulative sum)')
agg['user_acquisitions_YoYchange'] = (agg['user_acquisitions_2019'] - agg['user_acquisitions_2018']) / agg['user_acquisitions_2018']
px.bar(agg, x='week', y='user_acquisitions_YoYchange')
import seaborn as sns
cm = sns.light_palette("green", as_cmap=True)
agg[['week', 'user_acquisitions_YoYchange']].style.background_gradient(cmap=cm)
| week | user_acquisitions_YoYchange | |
|---|---|---|
| 0 | 1 | 0.910959 |
| 1 | 2 | 1.276119 |
| 2 | 3 | 0.670423 |
| 3 | 4 | 0.881356 |
| 4 | 5 | 0.599526 |
| 5 | 6 | 0.440559 |
| 6 | 7 | 0.314286 |
| 7 | 8 | 0.262948 |
| 8 | 9 | 0.092834 |
| 9 | 10 | 0.322581 |
| 10 | 11 | -0.111111 |
| 11 | 12 | 0.016413 |
| 12 | 13 | 0.610465 |
| 13 | 14 | 0.279817 |
| 14 | 15 | 0.175803 |
| 15 | 16 | 0.812689 |
| 16 | 17 | 0.152047 |
| 17 | 18 | 0.197822 |
| 18 | 19 | 0.366733 |
| 19 | 20 | -0.262916 |
| 20 | 21 | -0.186131 |
| 21 | 22 | -0.171958 |
| 22 | 23 | -0.189222 |
| 23 | 24 | -0.279343 |
| 24 | 25 | -0.158482 |
| 25 | 26 | 1.091195 |
| 26 | 27 | 0.413793 |
| 27 | 28 | -0.174373 |
| 28 | 29 | -0.215484 |
| 29 | 30 | -0.164474 |
| 30 | 31 | -0.215854 |
| 31 | 32 | -0.297710 |
| 32 | 33 | -0.253136 |
| 33 | 34 | 0.314394 |
| 34 | 35 | 0.314848 |
agg.sum()
week 630.000000 user_acquisitions_2018 21332.000000 total_profit_2018 14127.875386 user_acquisitions_2019 23204.000000 total_profit_2019 3207.238147 user_acquisitions_2018_cumsum 338502.000000 total_profit_2018_cumsum 280632.662484 user_acquisitions_2019_cumsum 409580.000000 total_profit_2019_cumsum 89777.523549 user_acquisitions_YoYchange 7.837417 dtype: float64
agg.mean()
week 18.000000 user_acquisitions_2018 609.485714 total_profit_2018 403.653582 user_acquisitions_2019 662.971429 total_profit_2019 91.635376 user_acquisitions_2018_cumsum 9671.485714 total_profit_2018_cumsum 8018.076071 user_acquisitions_2019_cumsum 11702.285714 total_profit_2019_cumsum 2565.072101 user_acquisitions_YoYchange 0.223926 dtype: float64
No brainer:
Based on users that purchased in W35 of 2019, comparing users who signed up in 2018 and 2019
Growth is defined as monthly acquisition in 2018 - monthly acquisition in 2019, divided by monthly acquisition in 2018.
Concluding thoughts:
I won't know until I see rest of the data, but I imagine growth would be higher in 2020 due to COVID. The key metric will most likely be retention to sustain the growth (if good growth is seen in 2020). It might be worth exploring new strategies for user acquisition. Assuming user acquisition strategy in 2018 and 2019 were the similar, the drop in growth in 2019 could be: What worked in 2018 didn't work as well in 2019, whether it be due to market competition or shift in user preference / behavior in 2019.